The article is dedicated to a modern algorithm for pronominal anaphora resolution. Anaphora resolution should
be considered within a wider range of problems related to language ambiguity resolution, for instance entity
recognition, reference analysis and, in the general case, semantic analysis of natural language text. From the
above we conclude that anaphora resolution is possible only at the semantic level of natural language
analysis. The main purpose of this work is the development of semantic heuristics for finding the most
probable antecedent of an anaphor by analysing the sentence context. The proposed algorithm
gives an improvement of about 5% over the standard Mitkov algorithm.

Keywords: natural language text processing, anaphora resolution, semantic analysis.

Introduction


Existing rule-based algorithms for anaphora resolution that rely on the analysis of syntactic
properties have already reached their limit in terms of quality. Statistics show that the
probability of connecting the right antecedent with an anaphor is about 85-90% [1], [2]. Further
optimization is complicated by conflicts that arise from the large number of syntax rules.
Attempts to optimize the coefficients that determine rule priorities (so that no conflicts
occur) decrease the probability of correct anaphora resolution [3]. The possible progress
is not significant (within 0.1-0.001%).

Anaphora resolution is impossible without the semantics of the antecedent candidates
and an analysis of how well the meaning of each candidate agrees with the meaning of the words
neighbouring the anaphor. As stated before, the main purpose of this
article is to create semantic heuristics that determine the correct antecedent using the
semantic meaning of sentences. We do this by modifying an existing method,
namely by adding semantic rules to the Mitkov algorithm [2].

There is a wide array of approaches to this problem. The modern
approaches include the Lappin and Leass algorithm, the centering algorithm, Hobbs' algorithm and the Mitkov
algorithm [1], [2]. The Mitkov algorithm is known to be quite flexible and adjustable. An ordinary
implementation of the Mitkov algorithm does not resolve dubious cases and can give a wrong answer.
Semantic rules such as “semantic triplet match”, together with syntactic restrictions, can
improve the results in ambiguous cases. This helps to widen the “bottleneck” of the Mitkov algorithm.

In the next sections we discuss our new resolution algorithm and present some statistics on its
resolution accuracy.


Mitkov algorithm 


The Mitkov algorithm for pronominal anaphora can be described as a set of rules that assign weights
to antecedent candidates; the candidate with the greatest salience is then chosen as the antecedent.
The set of rules is:

1. Definiteness: definite nouns get +1; candidates that do not contain any
definite noun get -1;

2. Givenness: candidates that represent the following topic get +1;

3. Indicator words: candidates that follow one of the verbs {discuss, present, illustrate, identify,
summarize, examine, describe, define} get +1;

4. Lexical reiteration: repeated candidates get +1 if they are repeated
once, +2 if they are repeated twice, and so on;

5. Non-pronominal phrases: candidates that belong to an NP get +1;

6. Collocation pattern preference: candidates that occupy the same syntactic
position as the pronoun get +2;

7. Connective pattern: in constructions such as “you V1 NP or con ((you) V2 it) con ((you) V3
it)”, the NP candidate gets +2;

8. Reminder indicator: candidates mentioned in previous sentences get +1,
those mentioned in the same sentence get +2;

9. Field indicator: candidates that concern the same field as the antecedent get +1;

10. Boost pronoun: candidates that are referred to by more pronouns get more
salience;

11. Syntax parallelism: candidates in the same syntactic position get +1;

12. Reference indicator: the most frequently referred-to antecedents get +1.
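
The weighting scheme described by the rule list can be sketched as follows. This is a minimal illustration, not the implementation used in the experiments: the class name `Candidate`, the indicator names, the example scores and the tie-breaking rule (the later mention wins) are all assumptions made for the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    """An antecedent candidate with per-indicator scores (hypothetical structure)."""
    text: str
    scores: dict = field(default_factory=dict)  # indicator name -> integer score

    @property
    def salience(self) -> int:
        # Total salience is the sum of all indicator scores.
        return sum(self.scores.values())


def best_antecedent(candidates):
    """Return the candidate with the greatest salience, or None for an empty list.

    Assumption of this sketch: candidates are given in text order and ties
    are broken in favour of the later (more recent) candidate.
    """
    if not candidates:
        return None
    indexed = enumerate(candidates)
    return max(indexed, key=lambda pair: (pair[1].salience, pair[0]))[1]


# Illustrative scores only; they are not taken from the paper.
example = [
    Candidate("data", {"definiteness": -1, "lexical_reiteration": 1}),
    Candidate("devices", {"definiteness": 1, "collocation": 2}),
]
winner = best_antecedent(example)
```

Keeping the per-indicator scores separate, instead of storing only the sum, makes it easier to see which rules conflict when two candidates end up with similar salience.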


«Штучний інтелект» 3'2012

O.O. Marchenko


Analysis and improvements of the Mitkov algorithm


The Mitkov algorithm is a rule-based approach. It relies on syntax rules that can conflict
with each other. The main “bottleneck” of the algorithm is that some relations must be resolved at the
semantic level, which cannot be done with syntax-based rules alone. In such cases the
probability of finding the right antecedent is very low. This problem can be solved only by introducing
semantic rules. Extending the set of syntax rules would only increase the conflicts
between the rules and lead to a wrong antecedent. We can also strengthen the pronominal
anaphora algorithm by introducing semantic measures. The main advantage of the
Mitkov algorithm is that it can easily be adjusted to process a set of sentences. In our
implementation we look back over four sentences.

Let us consider rule #6, the collocation pattern preference. We can extend this rule by adding
semantic information to it and thereby increase the probability of choosing the right antecedent.
One possible modification is to accept a candidate whose semantic position is not strictly
the same as the semantic position of the pronoun but is close enough. “Close enough” means that
the semantic distance is less than or equal to a value specified in advance. In our case we used
the semantic distance of Leacock and Chodorow:

\[ sim(c_1, c_2) = -\log \frac{len(c_1, c_2)}{2 \cdot MAX}, \]

where len is the number of edges on the shortest path in the taxonomy between the two
concepts (words) and MAX is the depth of the taxonomy [4], [5].

Another possible modification is to create triplets of the form VERB verb NOUN
noun1 NOUN noun2. With such a triplet we can use the semantic distance for noun1 and noun2.
More modifications are possible. They are built up layer by layer, so that the chances of
finding the right antecedent rise considerably.

We can also use syntactic restrictions when computing the weights of the antecedent NPs.

Let us consider the following example:

“To avoid data loss on devices, we should avoid storage of critical information on them”.

Let us look at a possible outcome of our algorithm using the Mitkov approach
without any syntactic restrictions: [candidate salience values; only the entries “data” and “losses, 3” are legible].
As we can see, the most probable antecedent is “data”. But if we use a restriction such as
“an anaphor cannot refer to its co-argument” [6], [7], the algorithm cuts off the “data” case, and
“devices” becomes the most probable antecedent.
We can use syntactic restrictions like:

A pronoun P and a noun phrase N are non-coreferential if any of the following conditions holds:

1. P and N have incompatible agreement features.

2. P is in the argument domain of N.

3. P is in the adjunct domain of N.

4. P is an argument of a head noun, N is not a pronoun, and N is contained in the head noun.

5. P is in the NP domain of N.

6. P is a determiner of a noun Q, and N is contained in Q.
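
A filter of this kind can be applied before the most salient candidate is chosen. The sketch below covers only condition 1 (agreement) and a crude stand-in for condition 2, treating two mentions as co-arguments when they are arguments of the same predicate. The `Mention` structure, the `predicate_id` field and all salience values are hypothetical; the values are chosen only to mirror the outcome described for the “data loss on devices” example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Mention:
    """A pronoun or noun phrase with toy features (hypothetical structure)."""
    text: str
    number: str        # "sg" or "pl"
    predicate_id: int  # id of the predicate this mention is an argument of
    salience: int = 0  # precomputed salience; unused for the pronoun itself


def agrees(pronoun: Mention, noun: Mention) -> bool:
    # Condition 1: incompatible agreement features block coreference.
    return pronoun.number == noun.number


def is_coargument(pronoun: Mention, noun: Mention) -> bool:
    # Simplified stand-in for condition 2: same predicate means co-argument.
    return pronoun.predicate_id == noun.predicate_id


def resolve(pronoun: Mention, candidates: List[Mention]) -> Optional[Mention]:
    """Drop blocked candidates, then return the most salient one (or None)."""
    allowed = [
        c for c in candidates
        if agrees(pronoun, c) and not is_coargument(pronoun, c)
    ]
    if not allowed:
        return None
    return max(allowed, key=lambda c: c.salience)


# Toy annotation: numbers and predicate ids are assumptions of this sketch.
them = Mention("them", "pl", predicate_id=2)
candidates = [
    Mention("data", "pl", predicate_id=2, salience=5),
    Mention("devices", "pl", predicate_id=1, salience=4),
    Mention("loss", "sg", predicate_id=1, salience=3),
]
chosen = resolve(them, candidates)
```

Without the filter the highest score would select “data”; with it, “data” is removed as a co-argument, “loss” fails agreement, and “devices” is returned.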


Main idea 


The main concept is to extract the semantic information related to the anaphor and to
find a noun with a similar semantic context. In our algorithm we look back up to five or
six sentences and try to find the closest semantic triplet. When the antecedent is itself a
pronoun, we substitute the anaphor that we are resolving with that pronoun and look for
the antecedent of the new pronoun. Once the antecedent found is a noun, we can
trace it back to our original pronoun.

For instance, let us consider the following example:

“John likes to solve extraordinary problems. The last set seems to be quite difficult; it took
him almost all day to solve them. His extraordinary talent gives him an advantage, that’s why
he is obsessed with solving them”.

Let us try to resolve the anaphor “them”:
First triplet: VERB(solving) NOUN1(them) NOUN2(he)
Second triplet: VERB(took) NOUN1(his) NOUN2(them)
Third triplet: VERB(solve) NOUN1(John) NOUN2(problems)

Combining the semantic approach with pronoun substitution, we conclude that “he” refers
to “John” and “them” refers to “problems”.
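
The pronoun substitution step can be sketched as following a chain of links until a noun is reached. The sketch assumes that the triplet matching has already produced, for each pronoun mention, its closest earlier mention; the `word@sentence` labels, the `LINKS` table, the pronoun list and the hop limit are all illustrative assumptions.

```python
from typing import Dict, Optional

# Small illustrative pronoun list; a real system would use a POS tagger.
PRONOUNS = {"he", "him", "his", "she", "her", "it", "its",
            "they", "them", "their"}


def word_of(mention: str) -> str:
    """Strip the hypothetical '@sentence' suffix from a mention label."""
    return mention.split("@")[0].lower()


def trace_to_noun(pronoun: str, links: Dict[str, str],
                  max_hops: int = 10) -> Optional[str]:
    """Follow pronoun -> antecedent links until a non-pronoun is found.

    Returns None if the chain ends without a noun, loops back on itself,
    or exceeds max_hops.
    """
    current = pronoun
    seen = set()
    for _ in range(max_hops):
        if current in seen:
            return None  # cycle of pronouns, nothing to anchor on
        seen.add(current)
        antecedent = links.get(current)
        if antecedent is None:
            return None
        if word_of(antecedent) not in PRONOUNS:
            return antecedent
        current = antecedent  # substitute and keep searching
    return None


# Links assumed to come from the three triplets of the example above.
LINKS = {
    "he@3": "his@2",
    "his@2": "John@1",
    "them@3": "them@2",
    "them@2": "problems@1",
}
he_refers_to = trace_to_noun("he@3", LINKS)
them_refers_to = trace_to_noun("them@3", LINKS)
```

The cycle check and the hop limit are defensive additions: a chain of pronouns that never reaches a noun would otherwise loop forever.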

Let us consider another example: 

“There was seen a tail of a fox. It has stolen a chicken. It was red, furry and with a white tip”. 

In this case the syntactic structure is identical, and it is impossible to determine the
antecedent of the last “it” with syntax rules alone. Using semantic rules we can determine that
a “tail” cannot steal anything, so the first “it” refers to “fox”. The second “it” cannot
refer to the fox, because a “tip” is a property not of a fox but of a tail. So the second “it”
corresponds to “tail”.
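
Semantic checks of this kind rely on a distance between concepts in a taxonomy. The sketch below computes the Leacock and Chodorow measure, -log(len / (2 * MAX)), over a small hand-made is-a tree. The taxonomy is hypothetical and stands in for a real ontology; len is counted in edges, as in the definition given earlier, so identical concepts (len = 0) are mapped to infinity here as an assumption of the sketch.

```python
import math

# Hypothetical toy is-a taxonomy (child -> parent); not a real ontology.
PARENT = {
    "entity": None,
    "animal": "entity",
    "artifact": "entity",
    "canine": "animal",
    "bird": "animal",
    "fox": "canine",
    "chicken": "bird",
    "device": "artifact",
}


def ancestors(concept):
    """Map the concept and each of its ancestors to the distance in edges."""
    distance = {}
    node, steps = concept, 0
    while node is not None:
        distance[node] = steps
        node = PARENT[node]
        steps += 1
    return distance


def path_length(c1, c2):
    """Number of edges on the shortest path between two concepts in the tree."""
    up1, up2 = ancestors(c1), ancestors(c2)
    common = set(up1) & set(up2)
    return min(up1[node] + up2[node] for node in common)


def taxonomy_depth():
    """Depth of the taxonomy: edges from the root to the deepest concept."""
    return max(len(ancestors(concept)) - 1 for concept in PARENT)


def lc_similarity(c1, c2):
    """Leacock-Chodorow similarity with len counted in edges."""
    length = path_length(c1, c2)
    if length == 0:
        return math.inf  # identical concepts; log(0) is undefined
    return -math.log(length / (2 * taxonomy_depth()))


fox_chicken = lc_similarity("fox", "chicken")
fox_device = lc_similarity("fox", "device")
```

Larger values mean closer concepts, so in this toy tree “fox” comes out closer to “chicken” than to “device”. With a real ontology the same comparison would be made between the nouns of two triplets.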


Experiments 


Here are some statistics demonstrating the improvements of the new algorithm over the standard
Mitkov algorithm.


[Bar chart over the sentence sets Type A, Type B, Type C and Type D]
Fig. 1. Improvements of the New Algorithm

The chart above presents the statistical information: the dark area shows the results of
the Mitkov algorithm without any modifications, the light area shows the results of the Mitkov
algorithm with modifications (semantic measures, semantic approach) on different types of
sentences. The Y axis shows the probability that the antecedent is correctly matched with the pronoun.
The X axis shows the sets of sentences that were used to test and compare the
improved Mitkov algorithm and the standard Mitkov algorithm.

Type A sentences are complex sentences that can contain more than one NP.
Type B is a sequence of at most two sentences. Type C is a sequence
of at most three sentences, and Type D is a sequence with a length
of four sentences. Each type contained about ten sentences.

As we can see, the modification gives an improvement of about 4-5% on every type of sentence.


Conclusions 


Analysis of existing algorithms has convinced us that it is impossible to solve the anaphora
resolution problem by syntactic means alone. Semantic rules are needed to
determine which of the antecedent candidates is closest to the words that are context neighbours of
the anaphor. Such semantic rules were created and added to the Mitkov algorithm.
In the current implementation we were able to improve the rule-based algorithm, and we have provided
statistical evidence of this. The semantic approach allows us to resolve some cases in which the
syntactic approach has to rely on rule priorities that were determined empirically. With
semantic rules we can determine the antecedent more precisely.

The use of semantic distance metrics built on global
ontology networks can improve the procedure that links the context of an
antecedent candidate to the position of the anaphor.
Semantic Modification of the Mitkov Algorithm for Anaphora Resolution

The article is dedicated to a modern algorithm for pronominal anaphora resolution. Anaphora
resolution should be considered within a wider range of problems related to language ambiguity
resolution, for instance entity resolution, reference analysis and, in the general case, semantic
analysis of natural language text. Analysis of existing algorithms has convinced us that it is
impossible to solve the anaphora resolution problem by syntactic means alone. From the above we
conclude that anaphora resolution is possible only at the semantic level of natural language
analysis. The main purpose of this work is the development of semantic heuristics for finding the most
probable antecedent of an anaphor by analysing the sentence context.

Semantic rules are needed to determine which of the antecedent candidates
is closest to the words that are context neighbours of the anaphor. Such semantic rules were created
and added to the Mitkov algorithm.

In the current implementation we were able to improve the rule-based algorithm, and we have provided
statistical evidence of this. The semantic approach allows us to resolve some cases in which the
syntactic approach has to rely on rule priorities that were determined empirically. With
semantic rules we can determine the antecedent more precisely.
The use of semantic distance metrics built on global ontology
networks can improve the procedure that links the context of an antecedent candidate
to the position of the anaphor. The proposed algorithm gives an improvement of about 5% over the
standard Mitkov algorithm.